191 research outputs found
Building Morphological Chains for Agglutinative Languages
In this paper, we build morphological chains for agglutinative languages by
using a log-linear model for the morphological segmentation task. The model is
based on the unsupervised morphological segmentation system called
MorphoChains. We extend the MorphoChains log-linear model by expanding the
candidate space recursively to cover more split points for agglutinative
languages such as Turkish, whereas in the original model candidates are
generated by considering only binary segmentation of each word. The results
show that we improve the state-of-the-art Turkish scores by 12%, reaching an F-measure
of 72%, and we improve the English scores by 3%, reaching an F-measure of 74%.
Overall, the system outperforms both MorphoChains and other well-known
unsupervised morphological segmentation systems. The results indicate that
candidate generation plays an important role in such an unsupervised log-linear
model that is learned using contrastive estimation with negative samples.
Comment: 10 pages, accepted and presented at CICLing 2017 (18th
International Conference on Intelligent Text Processing and Computational
Linguistics)
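The difference between binary and recursive candidate generation described in this abstract can be illustrated with a minimal Python sketch. It is not the MorphoChains implementation: the function names, the `min_len` parameter, and the plain string-splitting scheme are assumptions made only to show how re-splitting the stem enlarges the candidate space.

```python
from typing import List, Tuple


def binary_splits(word: str, min_len: int = 1) -> List[Tuple[str, str]]:
    """Return every (stem, suffix) pair from a single split point.

    min_len is a hypothetical lower bound on the length of each part.
    """
    return [
        (word[:i], word[i:])
        for i in range(min_len, len(word) - min_len + 1)
    ]


def recursive_splits(word: str, min_len: int = 1) -> List[Tuple[str, ...]]:
    """Return every segmentation reachable by repeatedly re-splitting the stem.

    The unsplit word is kept as a candidate; each binary split contributes
    one candidate per segmentation of its stem.
    """
    candidates: List[Tuple[str, ...]] = [(word,)]
    for stem, suffix in binary_splits(word, min_len):
        for stem_segmentation in recursive_splits(stem, min_len):
            candidates.append(stem_segmentation + (suffix,))
    return candidates


if __name__ == "__main__":
    word = "evlerde"  # Turkish ev-ler-de, "in the houses"
    print(len(binary_splits(word, min_len=2)), "binary candidates")
    print(len(recursive_splits(word, min_len=2)), "recursive candidates")
```

With `min_len=2`, the binary generator yields only two-part candidates such as `("evler", "de")`, whereas the recursive generator also reaches the three-part segmentation `("ev", "ler", "de")`.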
Methods and algorithms for unsupervised learning of morphology
This is an accepted manuscript of a chapter published by Springer in Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403 in 2014 available online: https://doi.org/10.1007/978-3-642-54906-9_15
The accepted version of the publication may differ from the final published version.
This paper is a survey of methods and algorithms for unsupervised learning of morphology. We provide a description of the methods and algorithms used for morphological segmentation from a computational linguistics point of view. We survey morphological segmentation methods based on MDL (minimum description length), MLE (maximum likelihood estimation), MAP (maximum a posteriori), and parametric and non-parametric Bayesian approaches. A review of the evaluation schemes for unsupervised morphological segmentation is also provided, along with a summary of results from the Morpho Challenge evaluations.
Genome-Wide Gene Expression Analysis Suggests an Important Role of Hypoxia in the Pathogenesis of Endemic Osteochondropathy Kashin-Beck Disease
Kashin-Beck Disease (KBD) is an endemic osteochondropathy, the pathogenesis of which remains unclear. In this study, we compared gene expression profiles of articular cartilage derived from KBD patients and from normal controls. Total RNA was isolated, amplified, labeled and hybridized to the Agilent human 1A 22 k whole genome microarray chip. qRT-PCR was conducted to validate our microarray data. We detected 57 up-regulated genes (ratios ≥2.0) and 24 down-regulated genes (ratios ≤0.5) in KBD cartilage. To further identify the key genes involved in the pathogenesis of KBD, Bayesian analysis of variance for microarrays (BAM) software was applied and identified 12 potential key genes with an average ratio of 6.64, involved in apoptosis, metabolism, cytokine & growth factor and cytoskeleton & cell movement. Gene Set Enrichment Analysis (GSEA) software was used to identify differentially expressed gene ontology categories and pathways. GSEA found that a set of apoptosis, hypoxia and mitochondrial function related gene ontology categories and pathways were significantly up-regulated in KBD compared to normal controls. Based on the results of this study, we suggest that chronic hypoxia-induced mitochondrial damage and apoptosis might play an important role in the pathogenesis of KBD. Our efforts may help to understand the pathogenesis of KBD as well as other osteoarthrosis with similar articular cartilage lesions.
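The ratio cut-offs quoted in this abstract (≥2.0 for up-regulation, ≤0.5 for down-regulation) can be applied as in the following minimal sketch. The function name, the gene identifiers, and the example ratios are hypothetical and are not taken from the study; only the two thresholds come from the abstract.

```python
from typing import Dict


def classify_by_ratio(
    ratios: Dict[str, float],
    up_threshold: float = 2.0,
    down_threshold: float = 0.5,
) -> Dict[str, str]:
    """Label each gene by its case/control expression ratio.

    Thresholds default to the cut-offs stated in the abstract.
    """
    labels: Dict[str, str] = {}
    for gene, ratio in ratios.items():
        if ratio >= up_threshold:
            labels[gene] = "up"
        elif ratio <= down_threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


if __name__ == "__main__":
    # Hypothetical ratios for illustration only.
    example = {"GENE_A": 6.64, "GENE_B": 0.4, "GENE_C": 1.2}
    print(classify_by_ratio(example))
```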
The Variational Garrote
In this paper, we present a new variational method for sparse regression
using regularization. The variational parameters appear in the
approximate model in a way that is similar to Breiman's Garrote model. We refer
to this method as the variational Garrote (VG). We show that the combination of
the variational approximation and regularization has the effect of making
the problem effectively of maximal rank even when the number of samples is
small compared to the number of variables. The VG is compared numerically with
the Lasso method, ridge regression and the recently introduced paired mean
field method (PMF) (M. Titsias & M. Lázaro-Gredilla, NIPS 2012). Numerical
results show that the VG and PMF yield more accurate predictions and more
accurately reconstruct the true model than the other methods. It is shown that
the VG finds correct solutions when the Lasso solution is inconsistent due to
large input correlations. Globally, VG is significantly faster than PMF and
tends to perform better as the problems become denser and in problems with
strongly correlated inputs. The naive implementation of the VG scales cubically
with the number of features. By introducing Lagrange multipliers we obtain a
dual formulation of the problem that scales cubically in the number of samples, but
close to linearly in the number of features.
Comment: 26 pages, 11 figures
Satisfaction with web-based training in an integrated healthcare delivery network: do age, education, computer skills and attitudes matter?
Background
Healthcare institutions spend enormous time and effort to train their workforce. Web-based training can potentially streamline this process. However, the deployment of web-based training in a large-scale setting with a diverse healthcare workforce has not been evaluated. The aim of this study was to evaluate the satisfaction of healthcare professionals with web-based training and to determine the predictors of such satisfaction, including age, education status and computer proficiency.
Methods
We conducted an observational, cross-sectional survey of healthcare professionals from six hospital systems in an integrated delivery network. We measured overall satisfaction with web-based training and responses to survey items measuring Website Usability, Course Usefulness, Instructional Design Effectiveness, Computer Proficiency and Self-learning Attitude.
Results
A total of 17,891 healthcare professionals completed the web-based training on the HIPAA Privacy Rule; of these, 13,537 completed the survey (response rate 75.6%). Overall course satisfaction was good (median, 4; scale, 1 to 5), with more than 75% of the respondents satisfied with the training (rating 4 or 5) and 65% preferring web-based training over traditional instructor-led training (rating 4 or 5). Multivariable ordinal regression revealed 3 key predictors of satisfaction with web-based training: Instructional Design Effectiveness, Website Usability and Course Usefulness. Demographic predictors such as gender, age and education did not have an effect on satisfaction.
Conclusion
The study shows that web-based training, when tailored to learners' backgrounds, is perceived as a satisfactory mode of learning by an interdisciplinary group of healthcare professionals, irrespective of age, education level or prior computer experience. Future studies should aim to measure the long-term outcomes of web-based training.
Assessing the clinical utility of cancer genomic and proteomic data across tumor types
Molecular profiling of tumors promises to advance the clinical management of cancer, but the benefits of integrating molecular data with traditional clinical variables have not been systematically studied. Here we retrospectively predict patient survival using diverse molecular data (somatic copy-number alteration, DNA methylation and mRNA, miRNA and protein expression) from 953 samples of four cancer types from The Cancer Genome Atlas project. We found that incorporating molecular data with clinical variables yielded statistically significantly improved predictions (FDR < 0.05) for three cancers but those quantitative gains were limited (2.2–23.9%). Additional analyses revealed little predictive power across tumor types except for one case. In clinically relevant genes, we identified 10,281 somatic alterations across 12 cancer types in 2,928 of 3,277 patients (89.4%), many of which would not be revealed in single-tumor analyses. Our study provides a starting point and resources, including an open-access model evaluation platform, for building reliable prognostic and therapeutic strategies that incorporate molecular data.
Blood cell gene expression associated with cellular stress defense is modulated by antioxidant-rich food in a randomised controlled clinical trial of male smokers
Background
Plant-based diets rich in fruit and vegetables can prevent development of several chronic age-related diseases. However, the mechanisms behind this protective effect are not elucidated. We have tested the hypothesis that intake of antioxidant-rich foods can affect groups of genes associated with cellular stress defence in human blood cells. Trial registration number: NCT00520819 http://clinicaltrials.gov.
Methods
In an 8-week dietary intervention study, 102 healthy male smokers were randomised to either a diet rich in various antioxidant-rich foods, a kiwifruit diet (three kiwifruits/d added to the regular diet) or a control group. Blood cell gene expression profiles were obtained from 10 randomly selected individuals of each group. Diet-induced changes on gene expression were compared to controls using a novel application of the gene set enrichment analysis (GSEA) on transcription profiles obtained using Affymetrix HG-U133-Plus 2.0 whole genome arrays.
Results
Changes were observed in the blood cell gene expression profiles in both intervention groups when compared to the control group. Groups of genes involved in regulation of cellular stress defence, such as DNA repair, apoptosis and hypoxia, were significantly upregulated (GSEA, FDR q-values < 5%) by both diets compared to the control group. Genes with common regulatory motifs for aryl hydrocarbon receptor (AhR) and AhR nuclear translocator (AhR/ARNT) were upregulated by both interventions (FDR q-values < 5%). Plasma antioxidant biomarkers (polyphenols/carotenoids) increased in both groups.
Conclusions
The observed changes in the blood cell gene expression profiles suggest that the beneficial effects of a plant-based diet on human health may be mediated through optimization of defence processes.
A Seven-Marker Signature and Clinical Outcome in Malignant Melanoma: A Large-Scale Tissue-Microarray Study with Two Independent Patient Cohorts
Current staging methods such as tumor thickness, ulceration and invasion of the sentinel node are known to be prognostic parameters in patients with malignant melanoma (MM). However, predictive molecular marker profiles for risk stratification and therapy optimization are not yet available for routine clinical assessment.
Using tissue microarrays, we retrospectively analyzed samples from 364 patients with primary MM. We investigated a panel of 70 immunohistochemical (IHC) antibodies for cell cycle, apoptosis, DNA mismatch repair, differentiation, proliferation, cell adhesion, signaling and metabolism. A marker selection procedure based on univariate Cox regression and multiple testing correction was employed to correlate the IHC expression data with the clinical follow-up (overall and recurrence-free survival). The model was thoroughly evaluated with two different cross validation experiments, a permutation test and a multivariate Cox regression analysis. In addition, the predictive power of the identified marker signature was validated on a second independent external test cohort (n = 225). A signature of seven biomarkers (Bax, Bcl-X, PTEN, COX-2, loss of β-Catenin, loss of MTAP, and presence of CD20 positive B-lymphocytes) was found to be an independent negative predictor for overall and recurrence-free survival in patients with MM. The seven-marker signature could also predict a high risk of disease recurrence in patients with localized primary MM stage pT1-2 (tumor thickness ≤2.00 mm). In particular, three of these markers (MTAP, COX-2, Bcl-X) were shown to offer direct therapeutic implications.
The seven-marker signature might serve as a prognostic tool enabling physicians to selectively triage, at the time of diagnosis, the subset of high recurrence risk stage I-II patients for adjuvant therapy. Selective treatment of those patients that are more likely to develop distant metastatic disease could potentially lower the burden of untreatable metastatic melanoma and revolutionize the therapeutic management of MM.